Abstract

Many multiple testing procedures make use of the p-values from the individual pairs
of hypothesis tests, and are valid if the p-value statistics are independent and uniformly
distributed under the null hypotheses. However, it has recently been shown that these 
types of multiple testing procedures are inefficient since such p-values do not depend 
upon all of the available data. This paper provides tools for constructing compound 
p-value statistics, which are those that depend upon all of the available data, but 
still satisfy the conditions of independence and uniformity under the null hypotheses. 
As an example, a class of compound p-value statistics for testing for location shifts 
is developed. It is demonstrated, both analytically and through simulations, that 
multiple testing procedures tend to reject more false null hypotheses when applied 
to these compound p-values rather than the usual p-values, and at the same time still 
guarantee the desired type I error rate control. The compound p-values, in conjunction
with two different multiple testing methods, are used to analyze a real microarray data
set. Applying either multiple testing method to the compound p-values, instead of the
usual p-values, enhances its power.

Keywords: Empirical Bayes, False Discovery Rate, Multiple Testing, Multiple Decision Function, 
Multiple Decision Process, Test Data, Training Data, Microarray Analysis. 



1 Introduction 



High throughput technology, such as the microarray, allows for thousands of pairs of hy- 
potheses to be tested simultaneously. The usual strategy, when testing a single pair of 
hypotheses, is to maximize the probability of correctly rejecting a null hypothesis while at 
the same time ensuring that the probability of erroneously rejecting the null hypothesis, the 
type I error rate, is controlled at some prespecified level. However, when testing M > 1 pairs 
of hypotheses simultaneously, an additional layer of complexity arises. 

* Corresponding author: jhabige@okstate.edu. Department of Statistics, Oklahoma State University, 301 
MSCS building, 74078-1056 

^Department of Statistics, 216 LeConte College, University of South Carolina, 29208 






Simply controlling the type I error rate at level $\alpha$ for each individual test can lead
to an unpalatable number of type I errors, especially when M is large. To combat this 
phenomenon, a multiple testing procedure can be used to control a globally defined error 
rate, such as the Family Wise Error Rate (FWER), which is the probability of committing 
one or more type I errors, or the False Discovery Rate (FDR), defined as the expected
proportion of type I errors among rejected null hypotheses. For a discussion of these and
other global type I error rates see Benjamini and Hochberg (1995); Storey (2002);
Dudoit and van der Laan (2008); Sarkar (2007). See also Westfall and Young (1993);
Dudoit et al. (2003) for a comprehensive review of multiple testing methods.

Many multiple testing procedures have been developed based on the premise that data
$X_m$ for testing the null hypothesis $H_{m0}$ against the alternative hypothesis $H_{m1}$ has been "ef[...]
Sidak (1967); Holm (1979); Simes (1986); Hommel (1988); Hochberg (1988);
Benjamini and Hochberg (1995, 2000); Benjamini et al. (2006); Genovese and Wasserman (2004, 2006);
Storey et al. (2004); Genovese et al. (2006) make use of the p-value statistics, while methods in
Efron et al. (2001); Storey (2002); Efron (2008); Sun and Cai (2007); Jin and Cai (2007)
make use of Z-value statistics, which are transformed test statistics that have a standard
normal distribution under the null hypotheses.

This paper provides an answer to the question: "How can test statistics for these mul- 
tiple testing procedures be computed in a more efficient manner, yet still allow for the 
procedures to be valid?" Since many multiple testing procedures depend upon the p-value 
statistics, and are valid if they are mutually independent and uniformly distributed un- 
der the null hypotheses, we focus on p-value statistics satisfying these independence and 
uniformity conditions. In particular, we provide tools for constructing compound p-value
statistics, which are those that depend upon all of the available data $X = (X_1, X_2, \ldots, X_M)$
via $P_1(X), P_2(X), \ldots, P_M(X)$, that are independent and uniformly distributed under the
null hypotheses. As an example, we develop compound p-value statistics for testing for
shifts in location, and show that they satisfy the uniformity and independence conditions. It
is shown analytically and through simulations that multiple testing procedures will remain
valid and tend to reject more false null hypotheses when applied to these compound p-values,
instead of the usual simple p-values, defined via $P_1(X_1), P_2(X_2), \ldots, P_M(X_M)$.

This paper proceeds as follows. In Section 2, we present the mathematical framework and
results that connect compound p-value statistics to compound decision functions. Section 3
utilizes sample-splitting ideas from Cox and Hinkley (1974) and Rubin et al. (2006), as well
as results from Section 2, to develop a method for constructing compound p-value statistics
that satisfy the independence and uniformity conditions. Shrinkage estimators and results 
from Sections 2 and 3 are used to develop a class of compound p-value statistics for testing 
for location shifts in Section 4. In Section 5, it is shown analytically and through simulation 
that the proposed compound p-value statistics, when compared to the usual simple p-value 
statistics, will lead to more powerful multiple testing procedures. Methods are also compared 
to some other compound multiple testing procedures. Compound and simple p-values, along 
with two different multiple testing procedures, are used to analyze a real microarray data 
set in Section 6. The compound p-values allow for substantially more rejected null
hypotheses. Some concluding remarks are in Section 7. To make this paper more readable, 
all proofs of the theorems are gathered in the Appendix. 

2 Framework and Results 



In this section, we present the basic framework, which was also considered in Peña et al.
(2011) and Habiger and Peña (2011), and establish some fundamental results that will be
useful for developing compound p-value statistics. Objects of main interest to us will be
a random $M \times N$ matrix of observables $X = (X_{mn}, m \in \mathcal{M}, n \in \mathcal{N}) \in \mathcal{X}$ with $\mathcal{M} = \{1, 2, \ldots, M\}$
and $\mathcal{N} = \{1, 2, \ldots, N\}$. Each $X_{mn}$ need not be 1-dimensional. To refer to a
portion of the matrix, we denote by $X[A, B] = (X_{mn} : m \in A, n \in B)$. To refer to a set of
columns indexed by $B \subseteq \mathcal{N}$, we write $X[\mathcal{M}, B] = X[,B]$ and likewise write $X[A,]$ to refer
to a set of rows. If referring to a single column, say column $n$, we write $X[,\{n\}] = X[,n]$.
Similarly, we write $X[m,]$ to refer to data in row $m$. To refer to an element of a matrix, we
write $X[m,n]$.

The distribution function of $X$ is represented by $F$. The collection of possible distribution
functions $\mathcal{F}$, sometimes called a model for $X$, will need to be specified, such as in Model 1.

Model 1 Let $X \sim F \in \mathcal{F}_N$, where $\mathcal{F}_N = \{F : F(x) = \prod_{n \in \mathcal{N}} G(x[,n]; \mu, \Sigma)\}$ and $G(\cdot; \mu, \Sigma)$
is the multivariate normal distribution function with $M \times 1$ mean vector $\mu$ and $M \times M$
covariance matrix $\Sigma$.

This model, which assumes that columns of $X$ are independent and identically distributed
according to an $M$-dimensional multivariate normal distribution, will be considered in more
detail in Section 4.

Pairs of hypotheses to be tested will be specified in terms of the model for the entire
matrix of data. Let $\mathcal{F}_{m0} \subset \mathcal{F}$ and $\mathcal{F}_{m1} \subset \mathcal{F}$ be null sub-models and alternative sub-models,
respectively, such that $\mathcal{F}_{m0} \cup \mathcal{F}_{m1} = \mathcal{F}$ and $\mathcal{F}_{m0} \cap \mathcal{F}_{m1} = \emptyset$. The goal is to determine, for
each $m \in \mathcal{M}$, which sub-model $F$ belongs to. This is equivalent to testing the null hypothesis
$H_{m0} : F \in \mathcal{F}_{m0}$ against the alternative hypothesis $H_{m1} : F \in \mathcal{F}_{m1}$, for each $m$.

Each of the $M$ pairs of hypotheses will be tested with either a compound decision
function, defined $\delta_m : \mathcal{X} \to \{0, 1\}$, or a simple decision function, defined $\delta_m : \mathcal{X}_m \to \{0, 1\}$,
where $X[m,] \in \mathcal{X}_m$. The size of $\delta_m$ is defined by

$$\eta_m = \sup_{F \in \mathcal{F}_{m0}} E_F[\delta_m(X)],$$

where $E_F[\delta_m(X)]$ is short for $E[\delta_m(X) \mid X \sim F]$. Since the size $\eta_m$ of $\delta_m$ can be specified, we
write $\delta_m(\cdot; \eta_m)$. Throughout this paper, it is assumed that for every $F \in \mathcal{F}$, $\eta_m \mapsto \delta_m(x; \eta_m)$ is
nondecreasing and right-continuous a.e. $[F]$. As in Peña et al. (2011) and Habiger and Peña
(2011), we refer to this collection of decision functions $\Delta_m = \{\delta_m(X; \eta_m) : \eta_m \in [0, 1]\}$ as
a decision process, and refer to $\Delta = (\Delta_m, m \in \mathcal{M})$ as a multiple decision process (MDP).
Further, we say that $\Delta_m$ is compound if each $\delta_m \in \Delta_m$ is compound.

This stochastic process framework allows for a natural definition of a p-value statistic.

Definition 1 The p-value statistic for decision process $\Delta_m = \{\delta_m(X; \eta_m) : \eta_m \in [0, 1]\}$ is
$P_{\Delta_m}(X) = \inf\{\eta_m \in [0, 1] : \delta_m(X; \eta_m) = 1\}$.

Given data $X = x$, $P_{\Delta_m}(x)$ is the smallest size allowing for $H_{m0}$ to be rejected. A p-value
statistic is said to be compound if it depends on all of the data, and is written $P_{\Delta_m}(X)$.
A p-value statistic will be called simple if it depends only on $X[m,]$, and will be written
$P_{\Delta_m}(X[m,])$. Note that if a decision process is compound, then its corresponding p-value
statistic will be compound by Definition 1, while if $\Delta_m$ is simple, then its p-value statistic
will be simple.
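As a minimal sketch of Definition 1, the block below recovers a p-value numerically as the smallest size at which a decision function rejects. The two-sided z-test decision function and all function names are illustrative assumptions, not objects defined in the paper; bisection is valid only because the size-to-decision map is nondecreasing, as assumed above.

```python
from statistics import NormalDist

PHI = NormalDist()  # standard normal distribution


def delta_two_sided(z, eta):
    """Hypothetical decision function: two-sided z-test of size eta (1 = reject)."""
    if eta <= 0.0:
        return 0
    if eta >= 1.0:
        return 1
    cutoff = PHI.inv_cdf(1.0 - eta / 2.0)
    return int(abs(z) >= cutoff)


def p_value_by_definition(z, tol=1e-10):
    """Smallest size in [0, 1] at which delta_two_sided rejects, found by bisection.

    Bisection works because eta -> delta(z; eta) is nondecreasing.
    """
    lo, hi = 0.0, 1.0  # delta(z, 0) = 0 and delta(z, 1) = 1
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if delta_two_sided(z, mid):
            hi = mid
        else:
            lo = mid
    return hi


def p_value_closed_form(z):
    """Familiar two-sided p-value, for comparison with the infimum above."""
    return 2.0 * (1.0 - PHI.cdf(abs(z)))
```

For this particular decision process the infimum agrees with the usual two-sided p-value up to the bisection tolerance.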

In Theorem 1 below, we see that Definition 1 ensures that a p-value statistic will be
stochastically greater than or equal to a uniform distribution under the null hypotheses. To
emphasize that this notion of uniformity depends upon the null model under consideration,
we say that $P_{\Delta_m}(X)$ is $\mathcal{F}_{m0}$-uniform if $\sup_{F \in \mathcal{F}_{m0}} P_F(P_{\Delta_m}(X) \le t_m) = t_m$ for every $t_m \in [0, 1]$,
and say that the collection of p-value statistics $P_\Delta(X) = (P_{\Delta_m}(X), m \in \mathcal{M})$ is $\mathcal{F}_{\mathcal{M}_0}$-uniform
if $P_{\Delta_m}(X)$ is $\mathcal{F}_{m0}$-uniform for each $m \in \mathcal{M}_0$, where $\mathcal{M}_0 = \{m : F \in \mathcal{F}_{m0}\}$ indexes
those pairs of hypotheses for which $H_{m0}$ is true.

Theorem 1 The collection of p-value statistics $P_\Delta(X) = (P_{\Delta_m}(X), m \in \mathcal{M})$ for a multiple
decision process $\Delta$ is $\mathcal{F}_{\mathcal{M}_0}$-uniform.

Many multiple testing procedures assume that p-value statistics are independent of each
other under the null hypotheses and independent of p-value statistics from false null
hypotheses. It is therefore useful to more formally examine this notion. We say that $P_\Delta(X)$ is
$\mathcal{F}_{\mathcal{M}_0}$-independent if for every $F \in \mathcal{F}_{\mathcal{M}_0}$ and $t = (t_1, t_2, \ldots, t_M) \in [0, 1]^M$,

$$P_F\Big(\bigcap_{m \in \mathcal{M}} [P_{\Delta_m}(X) \le t_m]\Big) = P_F\Big(\bigcap_{m \in \mathcal{M}_1} [P_{\Delta_m}(X) \le t_m]\Big) \prod_{m \in \mathcal{M}_0} P_F(P_{\Delta_m}(X) \le t_m), \tag{1}$$

where $\mathcal{M}_1 = \mathcal{M} \setminus \mathcal{M}_0$. Likewise, the MDP $\Delta$ is $\mathcal{F}_{\mathcal{M}_0}$-independent if for every $F \in \mathcal{F}_{\mathcal{M}_0}$,
$d = (d_1, d_2, \ldots, d_M) \in \{0, 1\}^M$, and $\eta = (\eta_1, \eta_2, \ldots, \eta_M) \in [0, 1]^M$, we have

$$P_F\Big(\bigcap_{m \in \mathcal{M}} [\delta_m(X; \eta_m) = d_m]\Big) = P_F\Big(\bigcap_{m \in \mathcal{M}_1} [\delta_m(X; \eta_m) = d_m]\Big) \prod_{m \in \mathcal{M}_0} P_F(\delta_m(X; \eta_m) = d_m). \tag{2}$$

Theorem 2 below states that a collection of p-value statistics satisfies the independence
condition if and only if their corresponding decision processes satisfy the condition.

Theorem 2 The collection of p-value statistics $P_\Delta(X)$ for a multiple decision process $\Delta$ is
$\mathcal{F}_{\mathcal{M}_0}$-independent if and only if $\Delta$ is $\mathcal{F}_{\mathcal{M}_0}$-independent.

This theorem allows us to use Definition 1 and an $\mathcal{F}_{\mathcal{M}_0}$-independent compound multiple
decision process as a mechanism for defining a collection of independent compound p-value
statistics. The next section provides some tools for constructing this type of multiple decision
process.



3 Data Splitting 

In this section, we will consider splitting one data set into two data sets via $X = (X_1, X_2)$,
which we will refer to as training data and test data, respectively. This idea was first
considered in Cox (1975) for testing a single pair of hypotheses in the normal distribution
setting. Rubin et al. (2006) also considered sample splitting in the multiple testing setting,
but focused on a specific type of decision function for controlling the expected number of false
positives. We avoid specifying the form of the decision function or error rate to be controlled
here. Our goal is to develop a general $\mathcal{F}_{\mathcal{M}_0}$-uniform and $\mathcal{F}_{\mathcal{M}_0}$-independent collection of
compound p-value statistics, which can then be used to control many different error rates.

Let $T \subset \mathcal{N}$ index a set of training data $X[,T]$ and let $\bar{T} = \mathcal{N} \setminus T$ index the set of test
data $X[,\bar{T}]$. Consider decision functions taking the form

$$\delta_m(X; \eta_m) = \delta_m(X[,T], X[m,\bar{T}]; \eta_m).$$

Note that each decision function depends on different test data $X[m,\bar{T}]$, but also depends
on the same training data $X[,T]$. Without loss of generality, we refer to the test data for
$H_{m0}$ by $Z_m = X[m,\bar{T}]$ and the training data by $Y = X[,T]$, where $Y_m = X[m,T]$. The
following independence condition will be necessary for constructing $\mathcal{F}_{\mathcal{M}_0}$-independent p-value
statistics.

Condition 1 The collection $\{(Y_m, Z_m) : m \in \mathcal{M}_0\}$ is a mutually independent collection of
random observables, and is independent of the collection $\{(Y_m, Z_m) : m \in \mathcal{M}_1\}$.

We are now in a position to state Theorem 3, which allows for compound p-value statistics
to be $\mathcal{F}_{\mathcal{M}_0}$-uniform and $\mathcal{F}_{\mathcal{M}_0}$-independent.

Theorem 3 Let $\Delta = (\Delta_m, m \in \mathcal{M})$ be a multiple decision process, where $\Delta_m = \{\delta_m(Y, Z_m; \eta_m) : \eta_m \in [0, 1]\}$
tests $H_{m0} : F \in \mathcal{F}_{m0}$ against $H_{m1} : F \in \mathcal{F}_{m1}$ for each $m$. If, for every $F \in \mathcal{F}_{m0}$,
$E_F(\delta_m(Y, Z_m; \eta_m) \mid Y) = \eta_m$ for every $m \in \mathcal{M}$ and $\eta_m \in [0, 1]$, then $P_\Delta(Y, Z)$ is $\mathcal{F}_{\mathcal{M}_0}$-uniform.
If, in addition, Condition 1 is satisfied, then $P_\Delta(Y, Z)$ is $\mathcal{F}_{\mathcal{M}_0}$-independent.

It is important to emphasize that the decision processes, and hence p-value statistics, are 
allowed to be dependent under the alternative hypotheses. In fact, we will see that im- 
provements over the usual simple p-values will be made by constructing p-values that are 
dependent under the alternative hypotheses. 

4 Composite Hypotheses 

In this section we will develop compound p-value statistics for testing multiple pairs of 
hypotheses regarding location parameters. The strategy is to develop an $\mathcal{F}_{\mathcal{M}_0}$-independent
compound multiple decision process, and then make use of Definition 1 and Theorem 3 to
derive $\mathcal{F}_{\mathcal{M}_0}$-uniform and $\mathcal{F}_{\mathcal{M}_0}$-independent compound p-values. In what follows, we utilize
Model 1 to develop the p-values, but results are not limited to this setting. This notion is
discussed in more detail in Section 5.

Assume that $X$ has distribution function $F \in \mathcal{F}_N$, where $\mathcal{F}_N$ is Model 1 with mean vector
$\frac{1}{N}\mu$ and covariance matrix $\frac{1}{N}I$. Here, we let the mean vector and covariance matrix depend
on $N$ so that, as we will see, the distribution of the sufficient statistics for the hypotheses of
interest is free of $N$. The pairs of hypotheses are $H_{m0} : F \in \mathcal{F}^N_{m0} = \{F \in \mathcal{F}_N : \mu_m = 0\}$ and
$H_{m1} : F \in \mathcal{F}^N_{m1} = \{F \in \mathcal{F}_N : \mu_m \ne 0\}$ for each $m$. The collection of true null hypotheses
is indexed by $\mathcal{M}_0 = \{m : \mu_m = 0\}$ and the collection of false null hypotheses is indexed by
$\mathcal{M}_1 = \{m : \mu_m \ne 0\}$. We simplify our notation by writing vectors of sufficient statistics for
$\mu$ with respect to the training data $X[,T]$ and test data $X[,\bar{T}]$ by

$$Y = \sum_{n \in T} X[,n] \quad \text{and} \quad Z = \sum_{n \in \bar{T}} X[,n],$$

respectively. Denote the vector of sufficient statistics for the complete data by

$$W = Y + Z.$$

Note that $Y \sim MVN(\lambda^2 \mu, \lambda^2 I)$ and $Z \sim MVN((1 - \lambda^2)\mu, (1 - \lambda^2)I)$, where $\lambda^2 = |T|/|\mathcal{N}|$
is the proportion of training data and $1 - \lambda^2$ is the proportion of test data, and $W \sim MVN(\mu, I)$.

To motivate our compound decision function, we first consider a simple decision function,
which is allowed to depend on the unknown $\mu$, rather than training data $Y$, and test data
$Z_m$. It is defined via

$$\delta_m(\mu, Z_m; \eta_m) = I\left(\frac{Z_m}{\sqrt{1 - \lambda^2}} \le l_m(\mu, \eta_m)\right) + I\left(\frac{Z_m}{\sqrt{1 - \lambda^2}} \ge u_m(\mu, \eta_m)\right), \tag{3}$$

where $l_m(\mu, \eta_m) = \Phi^{-1}(\eta_m h_m(\mu))$ and $u_m(\mu, \eta_m) = \Phi^{-1}(1 - \eta_m[1 - h_m(\mu)])$ are lower-
and upper-tail cutoffs, respectively, $\Phi(\cdot)$ is the standard normal distribution function, and
$h_m : \Re^M \to [0, 1]$ acts as a weight governing $l_m(\mu, \eta_m)$ and $u_m(\mu, \eta_m)$. Notice that when
$\mu_m = 0$, $Z_m/\sqrt{1 - \lambda^2}$ has a standard normal distribution, and hence $E_F(\delta_m(\mu, Z_m; \eta_m)) = \eta_m$
for any $h_m(\mu)$. Since $(Z_m, m \in \mathcal{M})$ is an independent collection, $\Delta$ is an $\mathcal{F}_{\mathcal{M}_0}$-independent
multiple decision process.
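The weighted two-tailed rule in expression (3) can be sketched as follows, as an illustration under the normal model above. The function names and the handling of the boundary weights 0 and 1 (cutoffs at minus or plus infinity) are assumptions made for this sketch. The last function evaluates the rejection probability exactly, which makes it easy to check that the size is the nominal value for any weight when the mean is zero.

```python
import math
from statistics import NormalDist

PHI = NormalDist()  # standard normal distribution


def quantile(q):
    """Standard normal quantile, extended so q = 0 and q = 1 give -inf and +inf."""
    if q <= 0.0:
        return -math.inf
    if q >= 1.0:
        return math.inf
    return PHI.inv_cdf(q)


def cdf(x):
    """Standard normal distribution function, extended to infinite arguments."""
    if x == -math.inf:
        return 0.0
    if x == math.inf:
        return 1.0
    return PHI.cdf(x)


def cutoffs(eta, h):
    """Lower- and upper-tail cutoffs for size eta and weight h in [0, 1]."""
    return quantile(eta * h), quantile(1.0 - eta * (1.0 - h))


def decision(z_m, lam2, eta, h):
    """Decision function (3): 1 rejects the null hypothesis, 0 retains it."""
    t = z_m / math.sqrt(1.0 - lam2)
    lower, upper = cutoffs(eta, h)
    return int(t <= lower or t >= upper)


def rejection_probability(mu_m, lam2, eta, h):
    """Exact probability that (3) rejects when the m-th mean equals mu_m."""
    shift = math.sqrt(1.0 - lam2) * mu_m
    lower, upper = cutoffs(eta, h)
    return cdf(lower - shift) + 1.0 - cdf(upper - shift)
```

With a zero mean, `rejection_probability(0.0, lam2, eta, h)` returns `eta` for every weight `h`, so the weight only redistributes the size between the two tails.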

Now, an Oracle, who knows $\mu$, could choose $h_m(\mu)$ to maximize the power of $\delta_m$, defined
via

$$\beta_m(\mu, \lambda, \eta_m) = E_F[\delta_m(\mu, Z_m; \eta_m)] = \Phi\left(l_m(\mu, \eta_m) - \sqrt{1 - \lambda^2}\,\mu_m\right) + 1 - \Phi\left(u_m(\mu, \eta_m) - \sqrt{1 - \lambda^2}\,\mu_m\right), \tag{4}$$

thereby maximizing the average power

$$\bar{\beta}(\mu, \lambda, \eta) = \frac{1}{M_1} \sum_{m \in \mathcal{M}_1} \beta_m(\mu, \lambda, \eta_m), \tag{5}$$

where $M_1 = |\mathcal{M}_1|$ is the number of false null hypotheses. It can be verified that for each
$m \in \mathcal{M}_1$ and for a fixed $\lambda$ and $\eta = (\eta_m, m \in \mathcal{M})$, $\beta_m(\mu, \lambda, \eta_m)$, and hence $\bar{\beta}(\mu, \lambda, \eta)$, is
maximized by defining $h_m(\mu) = I(\mu_m < 0)$. Thus, the Oracle decision function is

$$\delta^{(or)}_m(\mu_m, Z_m; \eta_m) = I\left(\frac{Z_m}{\sqrt{1 - \lambda^2}} \le l^{(or)}_m(\mu_m, \eta_m)\right) + I\left(\frac{Z_m}{\sqrt{1 - \lambda^2}} \ge u^{(or)}_m(\mu_m, \eta_m)\right),$$

where $l^{(or)}_m(\mu_m, \eta_m) = \Phi^{-1}(\eta_m I(\mu_m < 0))$ and $u^{(or)}_m(\mu_m, \eta_m) = \Phi^{-1}(1 - \eta_m[1 - I(\mu_m < 0)])$
are the lower-tail and upper-tail Oracle cutoffs arising by plugging in $I(\mu_m < 0)$ for $h_m(\mu)$
in $l_m(\mu, \eta_m)$ and $u_m(\mu, \eta_m)$ in expression (3). It should be noted that other optimality
criteria have been considered. Storey (2007) and Spjøtvoll (1972) considered maximizing
the expected number of true positives (ETP), which can be written $ETP = M_1 \bar{\beta}(\mu, \lambda, \eta)$,
while Peña et al. (2011) considered minimizing the expected number of "missed discoveries"
or missed discovery rate (MDR), which can be defined by $MDR = M_1[1 - \bar{\beta}(\mu, \lambda, \eta)] = M_1 - ETP$.
Both of these optimality criteria are satisfied by maximizing $\bar{\beta}(\mu, \lambda, \eta)$.
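The Oracle p-values can be derived using Definition 1. Writing

$$\left[\frac{Z_m}{\sqrt{1 - \lambda^2}} \le l^{(or)}_m(\mu_m, \eta_m)\right] = \left[\frac{\Phi\left(Z_m/\sqrt{1 - \lambda^2}\right)}{I(\mu_m < 0)} \le \eta_m\right]$$

and

$$\left[\frac{Z_m}{\sqrt{1 - \lambda^2}} \ge u^{(or)}_m(\mu_m, \eta_m)\right] = \left[\frac{1 - \Phi\left(Z_m/\sqrt{1 - \lambda^2}\right)}{1 - I(\mu_m < 0)} \le \eta_m\right]$$

with $a/0 = \infty$ for $a > 0$, it follows from Definition 1 that the Oracle p-value statistic for
$\Delta^{(or)}_m = \{\delta^{(or)}_m(\mu_m, Z_m; \eta_m) : \eta_m \in [0, 1]\}$ can be written as

$$P_{\Delta^{(or)}_m}(\mu_m, Z_m) = \min\left\{\frac{\Phi\left(Z_m/\sqrt{1 - \lambda^2}\right)}{I(\mu_m < 0)}, \frac{1 - \Phi\left(Z_m/\sqrt{1 - \lambda^2}\right)}{1 - I(\mu_m < 0)}\right\}. \tag{6}$$

We make use of this particular expression to allow for a more straightforward comparison
of the Oracle p-value and the compound p-value, which is presented next. It is important
to note that since $\Delta^{(or)} = (\Delta^{(or)}_m, m \in \mathcal{M})$ is an $\mathcal{F}^N_{\mathcal{M}_0}$-independent MDP,
$P_{\Delta^{(or)}}(\mu, Z) = (P_{\Delta^{(or)}_m}(\mu_m, Z_m), m \in \mathcal{M})$ is $\mathcal{F}^N_{\mathcal{M}_0}$-uniform and $\mathcal{F}^N_{\mathcal{M}_0}$-independent.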

Using training data $Y$ to estimate $I(\mu_m < 0)$ results in a compound decision function

$$\delta^{(c)}_m(Y, Z_m; \eta_m) = I\left(\frac{Z_m}{\sqrt{1 - \lambda^2}} \le l_m(Y, \eta_m)\right) + I\left(\frac{Z_m}{\sqrt{1 - \lambda^2}} \ge u_m(Y, \eta_m)\right),$$

where $l_m(Y, \eta_m) = \Phi^{-1}(\eta_m h_m(Y))$ and $u_m(Y, \eta_m) = \Phi^{-1}(1 - \eta_m[1 - h_m(Y)])$ are lower- and
upper-tail cutoffs, respectively, and $h_m(Y)$ estimates $I(\mu_m < 0)$. Arguments similar to those
made above can be used to show that the compound p-value statistic for $\Delta^{(c)}_m$ is

$$P_{\Delta^{(c)}_m}(Y, Z_m) = \min\left\{\frac{\Phi\left(Z_m/\sqrt{1 - \lambda^2}\right)}{h_m(Y)}, \frac{1 - \Phi\left(Z_m/\sqrt{1 - \lambda^2}\right)}{1 - h_m(Y)}\right\}. \tag{7}$$

See Habiger and Peña (2011) for other forms of simple p-values for composite hypothesis
testing.

Notice that given $Y = y$, if $h_m(y) = I(\mu_m < 0)$, then the compound and Oracle p-value
statistics are equivalent. Hence, the goal will be to develop an $h_m(Y)$ that estimates $I(\mu_m < 0)$
"well". However, before proceeding, it is important to point out that these compound
p-value statistics are $\mathcal{F}^N_{\mathcal{M}_0}$-independent and $\mathcal{F}^N_{\mathcal{M}_0}$-uniform, regardless of the performance of
$h_m(Y)$, and hence lead to valid multiple testing procedures. This result is formally stated in
Corollary 1.

Corollary 1 Let $\mathcal{M}_0 = \{m \in \mathcal{M} : \mu_m = 0\}$. Then $P_{\Delta^{(c)}}(Y, Z) = (P_{\Delta^{(c)}_m}(Y, Z_m), m \in \mathcal{M})$ is
$\mathcal{F}^N_{\mathcal{M}_0}$-uniform and $\mathcal{F}^N_{\mathcal{M}_0}$-independent.
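A minimal sketch of the compound p-value in expression (7) is given below, assuming a weight `h` in [0, 1] has already been computed from the training data. The function name is hypothetical. Setting `h` to 1 or 0 gives the one-sided Oracle p-values of expression (6), and `h = 0.5` recovers the usual two-sided p-value based on the test data alone.

```python
import math
from statistics import NormalDist

PHI = NormalDist()  # standard normal distribution


def compound_p_value(z_m, lam2, h):
    """Weighted two-tailed p-value from test statistic z_m and training-based weight h.

    lam2 is the proportion of training data, so z_m has variance 1 - lam2.
    Division by zero is treated as +infinity, matching the a/0 convention.
    """
    t = z_m / math.sqrt(1.0 - lam2)
    lower_tail = PHI.cdf(t)
    upper_tail = 1.0 - lower_tail
    left = lower_tail / h if h > 0.0 else math.inf
    right = upper_tail / (1.0 - h) if h < 1.0 else math.inf
    return min(left, right)
```

Because the standardized test statistic is standard normal under the null hypothesis whatever the value of `h`, the probability that this p-value falls at or below a threshold equals that threshold, which is the uniformity property stated in Corollary 1.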

Next, we develop a class of estimators of $I(\mu_m < 0)$ using empirical Bayes ideas. Assume,
for the moment, that $\mu_m$ is random, and for $m \in \mathcal{M}$, let $J_m = I(\mu_m \ne 0)$ be independent
and identically distributed Bernoulli random variables with success probability $p$. Note that
if $J_m = 1$, then $H_{m0}$ is false. Further, assume that the distribution function for $\mu_m$, given
$J_m = 1$, is

$$G(\mu_m \mid J_m = 1; \theta, \tau) = \Phi\left(\frac{\mu_m - \theta}{\tau}\right).$$

Since $Y_m \mid (\mu_m, J_m = 1) \sim N(\lambda^2 \mu_m, \lambda^2)$ and $\mu_m \mid (J_m = 1) \sim N(\theta, \tau^2)$, we have that
$\mu_m \mid (Y_m = y_m, J_m = 1)$ has a normal distribution with mean $(y_m \tau^2 + \theta)/(\lambda^2 \tau^2 + 1)$ and variance
$\tau^2/(\lambda^2 \tau^2 + 1)$. See, for example, Casella and Berger (2002), page 326. Thus, the posterior
distribution function of $\mu_m$, given $(Y_m = y_m, J_m = 1)$, is

$$G(\mu_m \mid Y_m = y_m, J_m = 1; \theta, \tau) = \Phi\left(\sqrt{\frac{\lambda^2 \tau^2 + 1}{\tau^2}}\left[\mu_m - \frac{y_m \tau^2 + \theta}{\lambda^2 \tau^2 + 1}\right]\right).$$

Here we condition on $J_m = 1$ since, when $J_m = 0$, $E_F[\delta^{(c)}_m(y, Z_m; \eta_m)] = \eta_m$ regardless of
$h_m(Y)$, and since the goal is to maximize the power of $\delta_m$ when $\mu_m \ne 0$. We should not
be concerned with maximizing the power of $\delta_m$ when $J_m = 0$ since this would correspond to
maximizing the probability of committing a type I error, i.e., making a false discovery.

Since $\theta$ and $\tau$ are not known, the estimate of $I(\mu_m < 0)$ given by $h(y_m, \theta, \tau) = G(0 \mid Y_m = y_m, J_m = 1; \theta, \tau)$
is not yet computable. In an effort to develop easy-to-compute p-value
statistics, we develop method-of-moments (MOM) estimators for these parameters. Still
viewing $(J_m, \mu_m)$ as random, we get

$$E(Y_m) = E(E(Y_m \mid J_m)) = p \lambda^2 \theta$$

and

$$Var(Y_m) = E(Var(Y_m \mid J_m)) + Var(E(Y_m \mid J_m)) = \lambda^2 + \lambda^4 p(\theta^2[1 - p] + \tau^2).$$

Setting these expressions equal to the sample mean $\bar{y}$ and sample variance $s^2$ of $y_1, y_2, \ldots, y_M$,
respectively, and solving for $\theta$ and $\tau$ yields the MOM estimates

$$\hat{\theta} = \frac{\bar{y}}{p \lambda^2}$$

and

$$\hat{\tau}^2 = \max\left\{\frac{s^2 - \lambda^2 - \bar{y}^2(1 - p)/p}{p \lambda^4}, 0\right\}.$$

Note that we set $\hat{\tau}^2$ equal to 0 whenever the solution yields a negative estimate of $\tau^2$.

Both of these MOM estimators now depend on the proportion of false null hypotheses
$p$, and hence it is necessary to either specify or estimate $p$. In the next section, we will
consider setting $p = 1$, and we will refer to resulting estimators of $\theta$, $\tau$, and $I(\mu_m < 0)$ as
approximate minimax estimators since this specification corresponds to the assumption that
all null hypotheses are false. Other possible specifications of $p$ will be considered in Section
6. For now, we develop a class of MOM estimators for $p$ using the fact that

$$E[I(-\epsilon \le Y_m \le \epsilon)] = (1 - p)A(\epsilon) + pB(\epsilon; \theta, \tau) \ge (1 - p)A(\epsilon), \tag{8}$$

where

$$A(\epsilon) = P(-\epsilon \le Y_m \le \epsilon \mid J_m = 0) = \Phi(\epsilon/\lambda) - \Phi(-\epsilon/\lambda)$$

and $B(\epsilon; \theta, \tau) = P(-\epsilon \le Y_m \le \epsilon \mid J_m = 1)$. Making use of expression (8) and sample moment
$\frac{1}{M}\sum_{m \in \mathcal{M}} I(-\epsilon \le y_m \le \epsilon)$, we get

$$\hat{p}(y; \epsilon) = 1 - \frac{\frac{1}{M}\sum_{m \in \mathcal{M}} I(-\epsilon \le y_m \le \epsilon)}{\Phi(\epsilon/\lambda) - \Phi(-\epsilon/\lambda)},$$

which no longer depends upon $\tau$ or $\theta$, but does depend on the tuning parameter $\epsilon$. This type
of estimator has been studied in the multiple testing literature, though not in this sample
splitting setting. See Efron et al. (2001) or Storey (2002), for example. For other types of
estimators of $p$, see Efron (2004), Langaas et al. (2005), Nettleton et al. (2006), Jin and Cai (2007),
Storey et al. (2004), among others. The choice of $\epsilon$ will be considered in more detail
in Sections 5 and 6.
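The estimator of the proportion of false null hypotheses can be sketched as below. The sketch assumes the estimator has the form implied by expression (8), namely one minus the observed fraction of training statistics inside the interval, divided by the null probability of that interval. The clipping to a small positive floor and to 1 is an assumption added here to keep the value usable in later formulas; it is not stated in the text.

```python
import math
from statistics import NormalDist

PHI = NormalDist()  # standard normal distribution


def estimate_p(y, lam2, eps, floor=1e-6):
    """Method-of-moments style estimate of the proportion of false nulls.

    y    : training statistics Y_1, ..., Y_M
    lam2 : proportion of training data (null variance of each Y_m)
    eps  : tuning parameter, half-width of the interval around zero
    floor: hypothetical lower bound applied so the estimate stays positive
    """
    lam = math.sqrt(lam2)
    frac_inside = sum(1 for v in y if -eps <= v <= eps) / len(y)
    null_prob = PHI.cdf(eps / lam) - PHI.cdf(-eps / lam)  # A(eps)
    raw = 1.0 - frac_inside / null_prob
    return min(max(raw, floor), 1.0)
```

Taking `eps` equal to one or two null standard deviations of the training statistic corresponds to the choices examined in the simulation study.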
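Finally, plugging

$$\hat{\theta}(y) = \frac{\bar{y}}{\lambda^2 \hat{p}(y; \epsilon)} \quad \text{and} \quad \hat{\tau}^2(y) = \max\left\{\frac{s^2 - \lambda^2 - \bar{y}^2[1 - \hat{p}(y; \epsilon)]/\hat{p}(y; \epsilon)}{\hat{p}(y; \epsilon)\lambda^4}, 0\right\} \tag{9}$$

for $\theta$ and $\tau$ in $G(0 \mid Y_m = y_m, J_m = 1; \theta, \tau)$ yields the estimate of $I(\mu_m < 0)$ given by

$$h_m(y) = \Phi\left(-\frac{y_m \hat{\tau}^2(y) + \hat{\theta}(y)}{\sqrt{\hat{\tau}^2(y)(\lambda^2 \hat{\tau}^2(y) + 1)}}\right). \tag{10}$$

In the next section, we study how the choice of $\lambda^2$ and the performance of $h_m(Y)$ affect the
power of the compound and Oracle decision functions, and hence affect the performance of
their corresponding p-value statistics.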
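The plug-in weights can be sketched as follows. This is an illustration under the normal prior described above: the location and spread of the prior are estimated by matching the mean and variance of the training statistics, and the weight is the estimated posterior probability that the mean is negative. The function name is hypothetical, the proportion `p` is passed in (for example 1, or an estimate), and the handling of a zero variance estimate as a limiting case is an assumption of this sketch.

```python
import math
from statistics import NormalDist, mean, variance

PHI = NormalDist()  # standard normal distribution


def estimate_weights(y, lam2, p):
    """Estimated weights h_m(y) for each training statistic in y.

    y    : training statistics Y_1, ..., Y_M
    lam2 : proportion of training data
    p    : assumed or estimated proportion of false nulls, in (0, 1]
    """
    ybar = mean(y)
    s2 = variance(y)  # sample variance with divisor M - 1
    theta = ybar / (p * lam2)
    tau2 = max((s2 - lam2 - ybar ** 2 * (1.0 - p) / p) / (p * lam2 ** 2), 0.0)
    if tau2 == 0.0:
        # Limiting case: the posterior concentrates on the sign of theta.
        if theta < 0.0:
            common = 1.0
        elif theta > 0.0:
            common = 0.0
        else:
            common = 0.5
        return [common] * len(y)
    denom = math.sqrt(tau2 * (lam2 * tau2 + 1.0))
    return [PHI.cdf(-(v * tau2 + theta) / denom) for v in y]
```

Weights near 1 place almost all of the size in the lower tail and weights near 0 place it in the upper tail, so a strongly negative training statistic pushes the test toward a one-sided lower-tail test.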
5 Assessment



5.1 Analytical Assessment 

To better understand the performance of the compound p-value statistic and ultimately
determine how $\lambda^2$ and $\epsilon$ should be chosen, we first compare the power of the Oracle decision
function to the usual simple decision function. The uniformly most powerful unbiased simple
decision function, which does not split the data set but makes use of $W_m = Y_m + Z_m$ as test
data, is defined via

$$\delta^{(s)}_m(W_m; \eta_m) = I\left(W_m \le l^{(s)}_m(\eta_m)\right) + I\left(W_m \ge u^{(s)}_m(\eta_m)\right),$$

where $l^{(s)}_m(\eta_m) = \Phi^{-1}(\eta_m/2)$ and $u^{(s)}_m(\eta_m) = \Phi^{-1}(1 - \eta_m/2)$. The power of this simple decision
function is

$$\beta^{(s)}_m(\mu_m, \eta_m) = \Phi\left(l^{(s)}_m(\eta_m) - \mu_m\right) + 1 - \Phi\left(u^{(s)}_m(\eta_m) - \mu_m\right).$$

From expression (4) and the definition of $\delta^{(or)}_m(\mu_m, Z_m; \eta_m)$, the power of the Oracle decision
function is

$$\beta^{(or)}_m(\mu_m, \lambda^2, \eta_m) = \Phi\left(l^{(or)}_m(\mu_m, \eta_m) - \sqrt{1 - \lambda^2}\,\mu_m\right) + 1 - \Phi\left(u^{(or)}_m(\mu_m, \eta_m) - \sqrt{1 - \lambda^2}\,\mu_m\right).$$

The potential gain in power of the Oracle decision function over the simple decision function
comes from the refinement of the upper-tail and lower-tail cutoffs. For example, suppose
$\mu_m = -1$, $\eta_m = .05$, and $\lambda^2 = 0$. Then, $l^{(or)}_m(-1, .05) = -1.645$ and $u^{(or)}_m(-1, .05) = \infty$,
while $l^{(s)}_m(.05) = -1.96$ and $u^{(s)}_m(.05) = 1.96$. Hence, $\beta^{(or)}_m(-1, 0, .05) = \Phi(-1.645 + 1)$, while
$\beta^{(s)}_m(-1, .05) = \Phi(-1.96 + 1) + [1 - \Phi(1.96 + 1)] \approx \Phi(-1.96 + 1)$. The Oracle decision
function power is then larger than the simple decision function power since its lower-tail
cutoff is -1.645 rather than -1.96.

However, to implement the Oracle decision function, we must take $\lambda^2 > 0$ so that some
data can be used to estimate the Oracle cutoffs. The potential loss in power as a result
of only using $(1 - \lambda^2)100\%$ of the data as test data is manifested in the decreased Oracle
effect size $|\sqrt{1 - \lambda^2}\,\mu_m|$. For example, when $\mu_m = -1$ and $\lambda^2 = .64$, then the effect sizes of
the Oracle and simple decision functions are .6 and 1, respectively, and the resulting powers
are approximately $\Phi(-1.645 + .6) = \Phi(-1.045)$ and $\Phi(-1.96 + 1) = \Phi(-.96)$, respectively.
Hence, the refined cutoffs of the Oracle decision function could not compensate for the
decreased effect size, and as a consequence the compound decision function will be less
powerful than the simple decision function. We more thoroughly examine this notion using
Figure 1, which depicts the regions of $\{(\mu_m, \lambda^2)\}$ where $\beta^{(or)}_m(\mu_m, \lambda^2, \eta_m) > \beta^{(s)}_m(\mu_m, \eta_m)$ for
several different values of $\eta_m$. We see that the Oracle decision function power is greater
than the simple decision function power for larger values of $\lambda$ when $\mu_m$ is near 0. Hence,
the potential gain in power of the compound decision function is more pronounced in the
frequently encountered low-power setting. It is important to emphasize that even if $\lambda^2$
is chosen so that some Oracle decision functions are less powerful than the simple decision
function, it may still be the case that the average power (computed via expression (5)) of the
Oracle decision functions is larger than the average power of the simple decision functions.

Figure 1: The region $\{(\mu_m, \lambda^2) : \beta^{(or)}_m(\mu_m, \lambda^2, \eta_m) > \beta^{(s)}_m(\mu_m, \eta_m)\}$ for $\eta_m =
.01, .001, .0001, .00001$ is the area to the left of each curve.
We now examine the properties of $h_m(Y)$ and the power of the compound decision function.
The ideal setting is that for small $\lambda^2$, $h_m(Y) = I(\mu_m < 0)$ with probability 1. Then, it
would follow from the definitions of $\delta^{(or)}_m$ and $\delta^{(c)}_m$ that

$$\beta^{(c)}_m(\mu, \lambda^2, \eta_m) = E_F\left[E_F\{\delta^{(c)}_m(Y, Z_m; \eta_m) \mid Y\}\right] = E_F\left[\delta^{(or)}_m(\mu_m, Z_m; \eta_m)\right] = \beta^{(or)}_m(\mu_m, \lambda^2, \eta_m).$$

In Theorem 4 we see that this ideal scenario is achieved asymptotically (in the number of
tests $M$) under the two-group model for any arbitrary choice of $\lambda^2$ and $\epsilon$. See Efron (2008)
for a discussion regarding this type of model, and Genovese and Wasserman (2002), Storey
(2003), Jin and Cai (2007), Romano and Wolf (2007), Sun and Cai (2007), among others,
for other interesting asymptotic results in this two-group setting. Below, since we will let
the number of tests $M$ tend to $\infty$, we write $Y_M = Y$ and $J_M = J$ to indicate that the
vectors have length $M$, and the notation "$\xrightarrow{d}$" and "$\xrightarrow{p}$" means "converges in distribution"
and "converges in probability", respectively.

Theorem 4 Suppose that $E[Y_M \mid J_M] = \lambda^2 \mu_M$ with $\mu_M = \theta J_M$ for some nonzero scalar
$\theta$ and $J_M$ a vector of independent and identically distributed Bernoulli random variables
with success probability $p \in (0, 1]$, and that $Cov(Y_M \mid J_M) = \lambda^2 I_M$. Suppose further that
estimators of $\theta$ and $\tau$ in expression (10) are defined as in expression (9) and that

$$S^2(Y_M) \xrightarrow{p} E[S^2(Y_M)] = \lambda^2 + \lambda^4 \theta^2 p(1 - p)$$

as $M \to \infty$, where $S^2(Y_M)$ is the sample variance of $Y_M$. Then for any $\epsilon > 0$ and
$\lambda^2 \in (0, 1]$,

$$h_m(Y_M) \xrightarrow{p} I(\theta < 0)$$

and $P_{\Delta^{(c)}_m}(Y_M, Z_m) \xrightarrow{d} P_{\Delta^{(or)}_m}(\mu_m, Z_m)$ as $M \to \infty$.

Several important points should be made. First, Theorem 4 holds for any fixed $\epsilon > 0$,
and hence, at least for large $M$ and under the two-group model, the choice of $\epsilon$ becomes
less of an issue. It should also be noted that $Y_M$ need not have a multivariate normal
distribution. It is only necessary that $S^2(Y_M)$ consistently estimate the marginal variance
of $Y_m$. Finally, the compound p-value is $\mathcal{F}^N_{\mathcal{M}_0}$-uniform and independent regardless of $M$. In
the next subsection, we study the performance of the compound p-value when the two-group
model is not satisfied, and $h_m(Y_M)$ need not estimate $I(\mu_m < 0)$ well.

5.2 Simulation Study 

In this section, we compare the performance of the compound, Oracle, and simple p-values 



in terms of their ability to allow for multi ple testing procedures to 



Storey 



particular, w e consid e r the BH p rocedur e in iBenjamini and Hochberg] (119951 ) and the Q- value 



3e m ore powerful. In 



(120021 ) and IStorey! (120031 ). The procedures are defined as follows. Let 



procedure in 

P = (Pm,Tn ^ M) be a collection of p-values for testing H m0 vs. H m \ for m G M., and 
denote the ordered p- values by pm < p( 2 ) < ... < P(m)- F° r eacn P & i r of hypotheses, the BH 
decision function is 5 m ,BH(P', ex) = I{p m < olJbh(p)/M) where 



Jbh(p) = max jm G M. : p (m) < a— j . 
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The Q-value decision function is denned via S mi Q(p; a) = I(q m (p) < a), where q m (p) is the 
estimated g-value for the mth pair of hypotheses, defined via 



q m (p) = inf pFDR(j). 

7>Pm 



Here, pFDR(*-f) is the estimated positive False Discovery Rate {pFDR = E[V/R\R > 0]) 
incurred by rejecting all null hypotheses with a p-value less than or equal to 7. Hence, the 



est pos s ible p FDR allowing for the rejection of H, 



mO- 



Storey! (120021 ) . which were sh own to be co nservative 



g-value can be thought of as the smal 
Estimates of the pFDR proposed in 
in certain settings, are obtained using the R package q-value. See IStoreyi (120021 ) for more 
details. 

The important point is that the Q-value procedure is designed to control the pFDR at
level $\alpha$ assuming that p-values are independent and uniformly distributed under the null
hypotheses. Likewise, Benjamini and Hochberg (1995) show that the BH procedure controls
the $FDR = E[V/R \mid R > 0]P(R > 0)$ at level $\alpha |\mathcal{M}_0|/M \le \alpha$ under the
independence and uniformity assumptions. Since the simple, Oracle, and compound p-values
developed in this paper are all $\mathcal{F}_{\mathcal{M}_0}$-uniform and -independent, both
procedures are valid when applied to any of these p-values.
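The two decision functions can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, and the q-value analogue below fixes the null proportion at 1 and uses the simple estimate $M\gamma / \#\{p_m \le \gamma\}$, which is not the estimator implemented in the qvalue package.

```python
def bh_decisions(pvalues, alpha):
    """BH decisions (1 = reject) returned in the original order of `pvalues`."""
    M = len(pvalues)
    ordered = sorted(pvalues)
    # J_BH: largest rank j with p_(j) <= alpha * j / M (0 when no rank qualifies).
    j_bh = 0
    for j, p in enumerate(ordered, start=1):
        if p <= alpha * j / M:
            j_bh = j
    threshold = alpha * j_bh / M
    return [int(p <= threshold) for p in pvalues]


def q_values_pi0_one(pvalues):
    """Step-up q-value analogue with the null proportion fixed at 1 (an assumption)."""
    M = len(pvalues)
    order = sorted(range(M), key=lambda i: pvalues[i])
    q = [0.0] * M
    running_min = 1.0
    # Walk from the largest p-value down, taking the infimum over larger cutoffs.
    for rank in range(M, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * M / rank)
        q[i] = running_min
    return q


# Made-up p-values, used only to exercise the two functions.
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
decisions = bh_decisions(p, alpha=0.05)
q = q_values_pi0_one(p)
```

With the null proportion fixed at 1, thresholding these q-values at $\alpha$ reproduces the BH decisions, which is why the two procedures behave so similarly when most null hypotheses are true.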

In our simulation, we considered the same model and hypotheses as in the last section
with $\mathcal{M} = \{1, 2, \ldots, 5000\}$, $\mathcal{M}_1 = \{1, 2, \ldots, 1000\}$, and
$\mathcal{M}_0 = \{1001, 1002, \ldots, 5000\}$. For $m \in \mathcal{M}_0$, $\mu_m = 0$. Hence, 20%
of null hypotheses are false. For $m \in \mathcal{M}_1$ we take
$\mu_m = \Phi^{-1}(m/1001; \theta, \tau)$, where $\Phi^{-1}(\cdot; \theta, \tau)$ is the quantile
function for a normal distribution with mean $\theta$ and variance $\tau^2$. Hence, the $\mu_m$s
are approximately the expected values of the order statistics from a normal distribution with
mean $\theta$ and variance $\tau^2$, thereby allowing the location and spread of the signal,
under the alternative hypotheses, to be governed by $\theta$ and $\tau$. Here, we will consider
all combinations of $\theta \in \{0, 2, 4\}$ and $\tau^2 \in \{0, 2\}$. Notice that when
$\theta = 0$, the $\mu_m$s from false null hypotheses are symmetric about 0. Sun and Cai (2007)
showed that in this setting, and under a two-group model, simple p-values tend to yield
efficient multiple testing procedures. When $\theta$ is not 0, however, the signals are not
symmetric about 0. Also, when $\tau = 0$, the two-group model is satisfied and Theorem 4 is
applicable. When $\tau^2 = 2$, the two-group model is not satisfied, and it need not be the case
that $I(\mu_m < 0)$ is "well-estimated" by $h_m(y)$. For the kth replicated data set, vectors of
training data and test data are generated according to
$Y_k \sim MVN(\lambda^2 \mu, \lambda^2 I)$ and $Z_k \sim MVN((1 - \lambda^2)\mu, (1 - \lambda^2)I)$,
respectively. For $k = 1, 2, \ldots, K = 1000$, both procedures are applied to the collection of
Oracle p-values defined earlier, and to three different collections of compound p-values in (7)
computed by taking $\hat{p}(y; \lambda)$, $\hat{p}(y; 2\lambda)$, and $p = 1$. The choice of
$\epsilon = \lambda$ and $2\lambda$, which is 1 and 2 standard deviations of $Y_m$ under $H_{m0}$,
was recommended in Efron (2004) for this type of estimator. The usual simple p-values, which
make use of all of the data $W_m = Y_m + Z_m$ as test data rather than just $Z_m$, are computed
via $P_{\Delta^W}(W_m) = 2[1 - \Phi(|W_m|)]$.
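One replicate of this design can be generated with the Python standard library alone. The sketch below is an assumption-laden illustration, not the code used for the study: the function names and the seed are hypothetical, and the degenerate case $\tau = 0$ is handled separately by setting every false-null mean to $\theta$.

```python
import random
from statistics import NormalDist


def signal_means(theta, tau, n_false=1000, n_total=5000):
    """Means mu_m: normal quantiles for the false nulls, 0 for the true nulls."""
    if tau == 0:
        false_means = [float(theta)] * n_false
    else:
        dist = NormalDist(mu=theta, sigma=tau)  # tau is the standard deviation
        false_means = [dist.inv_cdf(m / (n_false + 1)) for m in range(1, n_false + 1)]
    return false_means + [0.0] * (n_total - n_false)


def simulate_replicate(mu, lam2, rng):
    """Draw training data Y, test data Z, and simple p-values based on W = Y + Z."""
    y = [rng.gauss(lam2 * m, lam2 ** 0.5) for m in mu]
    z = [rng.gauss((1 - lam2) * m, (1 - lam2) ** 0.5) for m in mu]
    phi = NormalDist().cdf
    simple_p = [2 * (1 - phi(abs(a + b))) for a, b in zip(y, z)]
    return y, z, simple_p


mu = signal_means(theta=2, tau=2 ** 0.5)          # theta = 2, tau^2 = 2
y, z, simple_p = simulate_replicate(mu, lam2=0.05, rng=random.Random(0))
```

Because $Y_m$ and $Z_m$ are independent with variances $\lambda^2$ and $1 - \lambda^2$, their sum $W_m$ has mean $\mu_m$ and unit variance, which is what the simple p-value formula relies on.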

Both procedures were applied to all types of p-values for all data sets at $\alpha = .05$. The
average sample pFDR of the Q-value procedure was less than .05 for all configurations and
p-value types. Similarly, the average sample FDR of the BH procedure was less than .05 for
all configurations and p-value types. The average power of the BH procedure for a particular
set of p-values and $(\theta, \tau)$ combination is estimated via

\[ \frac{1}{K} \sum_{k=1}^{K} \left[ \frac{1}{|\mathcal{M}_1|} \sum_{m \in \mathcal{M}_1} \delta_{m,BH}(p_k; \alpha) \right], \]

where $p_k$ denotes the collection of p-values computed from the kth replicated data set.
The average power of the Q-value procedure is computed analogously. Results are presented
in Table 1.
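The power estimate is the average, over replicates, of the fraction of false null hypotheses that are rejected. A minimal sketch follows; the function name and the toy decisions are hypothetical, and indices are 0-based.

```python
def average_power(decisions_by_replicate, false_null_indices):
    """Mean over replicates of the proportion of false nulls that were rejected."""
    K = len(decisions_by_replicate)
    n_false = len(false_null_indices)
    total = 0.0
    for decisions in decisions_by_replicate:
        total += sum(decisions[m] for m in false_null_indices) / n_false
    return total / K


# Two toy replicates with four tests each; tests 0 and 1 are the false nulls.
toy_decisions = [[1, 0, 1, 0], [1, 1, 0, 0]]
toy_power = average_power(toy_decisions, false_null_indices=[0, 1])
```

Rejections of true null hypotheses (such as test 2 in the first toy replicate) do not enter the estimate; they are accounted for by the FDR and pFDR summaries instead.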

First, notice that when $\tau = 0$ and the two-group model is satisfied, the power of a
multiple testing procedure which makes use of the Oracle p-values is equivalent to the power
of the procedure when using compound p-values for any choice of $\epsilon$ or $\lambda^2$, just as
Theorem 4 predicted. Further, this power can be substantially larger than the power of the same
multiple testing procedure that makes use of the simple p-values, especially in the low-power
setting. For example, for $\lambda^2 = .01$, $\theta = 2$, and $\tau^2 = 0$, the power of the Q-value
procedure is increased by 83% when using the compound p-values (when using
$\hat{p}(Y; \epsilon)$) over the simple p-values (.22/.12 = 1.83). The power of the BH procedure
is increased by 80% (.18/.10 = 1.8). This supports findings in the previous subsection (see
Figure 1), where it was argued that the greatest potential for gain in power occurs when
$\mu_m$ is near 0.

Likewise, as discussed in the previous subsection, when too much data is used as training
data, Oracle p-values, and hence compound p-values, need not yield more powerful multiple
testing procedures. For example, when $\lambda^2 = .2$, the average power of the simple decision
functions is greater than the average power of the Oracle decision functions in most settings
(the exception being in the frequently encountered low power setting when $\theta = 2$ and
$\tau = 0$). This scenario can and should be avoided in practice by choosing $\lambda^2 < .2$.

When $\tau^2 = 2$ and $\lambda^2 < .1$ (note that the two-group model is not satisfied so that
$h_m(y)$ need not estimate $I(\mu_m < 0)$ well), we see that the compound p-values still result
in more power than the usual simple p-values. The only exception is the setting when
$\theta = 0$. However, the loss in power in this setting is small relative to the gain in power
in the non-symmetric settings, especially when a small portion of data are used as training
data and the data from false null hypotheses are highly concentrated.

Table 1: The average power of the BH and Q-value procedures when making use of simple
p-values ($\lambda^2 = 0$), Oracle p-values, and compound p-values where $p$ is estimated with
$\hat{p}(y; \lambda)$, $\hat{p}(y; 2\lambda)$, and 1.

BH Procedure

                                   $\tau^2 = 0$                $\tau^2 = 2$
$\lambda^2$  p-value type          $\theta = 2$  $\theta = 4$  $\theta = 0$  $\theta = 2$  $\theta = 4$
0            Simple                0.10          0.92          0.16          0.36          0.72
.01          Oracle                0.18          0.95          0.20          0.40          0.76
.01          $p = 1$               0.15          0.94          0.13          0.37          0.74
.01          $\hat p(y; .01)$      0.18          0.95          0.10          0.38          0.76
.01          $\hat p(y; .02)$      0.18          0.95          0.09          0.38          0.76
.05          Oracle                0.16          0.94          0.19          0.39          0.75
.05          $p = 1$               0.12          0.93          0.15          0.36          0.73
.05          $\hat p(y; .05)$      0.16          0.94          0.13          0.37          0.75
.05          $\hat p(y; .1)$       0.16          0.94          0.12          0.37          0.75
.10          Oracle                0.14          0.93          0.17          0.38          0.74
.10          $p = 1$               0.10          0.92          0.14          0.35          0.72
.10          $\hat p(y; .1)$       0.14          0.93          0.15          0.36          0.74
.10          $\hat p(y; .2)$       0.14          0.93          0.14          0.36          0.74
.20          Oracle                0.10          0.89          0.15          0.34          0.71
.20          $p = 1$               0.07          0.88          0.12          0.32          0.70
.20          $\hat p(y; .2)$       0.10          0.89          0.13          0.33          0.71
.20          $\hat p(y; .4)$       0.10          0.89          0.13          0.33          0.71

Q-value Procedure

                                   $\tau^2 = 0$                $\tau^2 = 2$
$\lambda^2$  p-value type          $\theta = 2$  $\theta = 4$  $\theta = 0$  $\theta = 2$  $\theta = 4$
0            Simple                0.12          0.93          0.16          0.37          0.74
.01          Oracle                0.22          0.96          0.21          0.42          0.77
.01          $p = 1$               0.18          0.95          0.13          0.38          0.75
.01          $\hat p(y; .01)$      0.22          0.96          0.10          0.39          0.77
.01          $\hat p(y; .02)$      0.22          0.96          0.10          0.39          0.77
.05          Oracle                0.20          0.95          0.20          0.41          0.76
.05          $p = 1$               0.15          0.94          0.15          0.37          0.74
.05          $\hat p(y; .05)$      0.20          0.95          0.14          0.38          0.76
.05          $\hat p(y; .1)$       0.20          0.95          0.12          0.38          0.76
.10          Oracle                0.17          0.94          0.19          0.39          0.75
.10          $p = 1$               0.13          0.93          0.15          0.36          0.74
.10          $\hat p(y; .1)$       0.18          0.94          0.15          0.36          0.75
.10          $\hat p(y; .2)$       0.18          0.94          0.15          0.36          0.75
.20          Oracle                0.13          0.91          0.16          0.36          0.73
.20          $p = 1$               0.09          0.90          0.13          0.34          0.71
.20          $\hat p(y; .2)$       0.13          0.91          0.14          0.34          0.72
.20          $\hat p(y; .4)$       0.13          0.91          0.14          0.33          0.72

In general, if less than 10% of the data is being used as training data, compound p-values
will tend to lead to more powerful multiple testing procedures. The biggest gain in power
occurs in the low-power setting when the signals (the $\mu_m$s) are identical. As the signals
become more dispersed, less power is gained.



5.3 Comparison to Other Compound Methods 

The sample splitting approach allows for more modeling assumptions regarding the joint 
behavior of the data, and at the same time enjoys a certain robustness property. To see why, 
first a discussion regarding relaxing assumptions from the previous sections is provided. 
Then, the methodology is compared to competing strategies. 

In general, one may compute a test statistic for test data via $T_m = T(X[m, \bar{\mathcal{T}}])$,
where $T$ is some test statistic so that under $H_{m0}$, $T_m \sim F$. Then,
$Z_m = \Phi^{-1}(F(T_m))$ has a standard normal distribution (so long as $F$ is continuous) under
the null hypothesis by the probability integral transformation. Compound p-values can then be
computed as in the previous section (with $\lambda^2 = 1$). This is demonstrated in detail in the
following section. Then, from Theorem 3, the resulting compound p-values will be uniformly
distributed under $H_{m0} : T_m \sim F$. If test data are independent under the null hypotheses,
p-values will remain independent under the null hypotheses as well. Hence, regardless of the
distribution of the test statistics under the alternative hypothesis, the applied multiple
testing procedure, whichever is chosen, will be valid. It is only necessary that the
appropriate test statistic be chosen so that $T_m$ does indeed have distribution function $F$
under $H_{m0}$. For robust test statistics for multiple testing procedures see Habiger and
Peña (2011).
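The transformation $Z_m = \Phi^{-1}(F(T_m))$ can be checked numerically. The sketch below assumes, purely for illustration, that the null distribution $F$ is exponential with rate 1; the function names, the seed, and the sample size are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, pstdev


def to_z_score(t, null_cdf):
    """Map a statistic with continuous null CDF `null_cdf` to the standard normal scale."""
    return NormalDist().inv_cdf(null_cdf(t))


def exponential_cdf(t):
    # Hypothetical null distribution F, chosen only for illustration.
    return 1.0 - math.exp(-t)


rng = random.Random(1)
null_statistics = [rng.expovariate(1.0) for _ in range(20000)]
z_scores = [to_z_score(t, exponential_cdf) for t in null_statistics]

# Under the null, the transformed values should have mean near 0 and sd near 1.
z_mean, z_sd = mean(z_scores), pstdev(z_scores)
```

The same two-step mapping applies to any continuous null distribution, which is why only the null behavior of $T_m$ has to be modeled correctly.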



To better understand the sample splitting approach, it is useful to first discuss procedures
based on the two-group model. Efron et al. (2001), Sun and Cai (2007), among others,
assume that $Z_m \sim f = p f_0 + (1 - p) f_1$, where $f_0$ is the density of $Z_m$ under $H_{m0}$,
$f_1$ the density of $Z_m$ under $H_{m1}$, and $p$ is a mixing proportion. Sun and Cai (2007) show
that the Lfdr statistic, defined

\[ Lfdr(z_m) = \frac{\hat{p} f_0(z_m)}{\hat{p} f_0(z_m) + (1 - \hat{p}) \hat{f}_1(z_m)}, \]

can be used to control the FDR (asymptotically in $M$) so long as $p \in (0, 1)$ and $\hat{p}$ and
$\hat{f}_1$ are consistent estimators. Since the validity of the procedure requires consistent
estimation of $f_1$, it is vital that a flexible model for $f_1$ be utilized, as is done in the
above references. Added efficiency stems from the fact that the Lfdr statistic is a monotone
(decreasing) function of the estimated likelihood ratio statistic
$\Lambda(z_m) = \hat{f}_1(z_m)/f_0(z_m)$. See Habiger (2011) for details. The procedure is
compound because joint behavior of the data is utilized, i.e. information is pooled, through
the estimation of $f_1$ with $z_1, z_2, \ldots, z_M$. The resulting decision rule, which can be
written $\delta(z) = [I(\Lambda(z_1) > c), \ldots, I(\Lambda(z_M) > c)]$ for some cutoff $c$, is
referred to as symmetric since for all permutation operators $\tau$,
$\tau(\delta(z)) = \delta(\tau(z))$.
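The relationship between the Lfdr statistic and the likelihood ratio can be made concrete with a small sketch. Everything specific here is an assumption made for illustration: the alternative density is taken to be a known $N(2, 1)$ rather than an estimate, the mixing proportion is fixed at 0.8, and the function names are hypothetical.

```python
from statistics import NormalDist


def lfdr(z, p, f0, f1):
    """Local false discovery rate under the two-group mixture p*f0 + (1-p)*f1."""
    null_part = p * f0(z)
    return null_part / (null_part + (1 - p) * f1(z))


def likelihood_ratio(z, f0, f1):
    return f1(z) / f0(z)


f0 = NormalDist(0, 1).pdf
f1 = NormalDist(2, 1).pdf    # hypothetical alternative density, not an estimate
p = 0.8                      # hypothetical mixing proportion

z_values = [-1.0, 0.5, 2.0, 3.5]
lfdr_values = [lfdr(z, p, f0, f1) for z in z_values]
# Equivalent form: Lfdr = 1 / (1 + ((1 - p) / p) * Lambda), decreasing in Lambda.
via_ratio = [1 / (1 + (1 - p) / p * likelihood_ratio(z, f0, f1)) for z in z_values]
```

Since the same function of $z_m$ is applied to every test, permuting the inputs simply permutes the decisions, which is the symmetry property described above.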

In our example in Section 4, we allowed for data to vary according to a different
distribution under each alternative hypothesis. Specifically it was assumed that
$Z_m \sim f = p f_0 + (1 - p) f_m$, where $f_m$ is an unknown normal density with mean $\mu_m$.
The result was a compound decision rule that depended upon $M$ different likelihood ratio
statistics $\Lambda_m(z_m) = \hat{f}_m(z_m)/f_0(z_m)$, $m \in \mathcal{M}$, and hence need not be
symmetric. We focused on the estimation of $I(\mu_m < 0)$ since the form of the likelihood
ratio statistic only depends upon this quantity in the normal setting. The joint behavior of
the data was modeled by assuming that $\mu_m \sim N(\theta, \tau^2)$, and information is pooled
by then allowing $f_m$ to depend upon all the training data via $\hat{\theta}(y)$ and
$\hat{\tau}(y)$. Storey (2007) also considered basing decision rules on $M$ different normal
models.

The main difference between our approach and the aforementioned is that the information 
pooling is done using only training data, rather than all of the data, and that p-values for 
each decision function are provided. This sample splitting approach allows for valid p-values,
even if the data are incorrectly modeled under the alternative hypothesis, and even if the 
number of tests M is small. For this reason, it is reasonable to base each Oracle decision 
rule on stronger modeling assumptions, as was done here. Further, by computing p-values 
for each test, any number of multiple testing procedures could be employed to control the 
error rate of interest, including but not limited to the FDR, pFDR, or FWER. 
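Because the output of the approach is an ordinary collection of p-values, any p-value based procedure can be applied to it. As one illustration for the FWER, the sketch below implements Holm's step-down procedure; Holm's method is a substitution chosen here for illustration and is not a procedure studied in this paper, and the function name and example p-values are hypothetical.

```python
def holm_decisions(pvalues, alpha):
    """Holm step-down decisions (1 = reject), in the original order of `pvalues`."""
    M = len(pvalues)
    order = sorted(range(M), key=lambda i: pvalues[i])
    decisions = [0] * M
    for step, i in enumerate(order):
        # The (step + 1)-th smallest p-value is compared with alpha / (M - step).
        if pvalues[i] <= alpha / (M - step):
            decisions[i] = 1
        else:
            break  # stop at the first failure; all larger p-values are retained
    return decisions


example_decisions = holm_decisions([0.01, 0.04, 0.03, 0.005], alpha=0.05)
```

The function accepts simple, Oracle, or compound p-values alike; only the input changes.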



6 Application to a Real Data Set 



We now analyze the microarray data in Singh et al. (2002), as in Efron (2009). Let $X[m, n]$
denote the mth gene expression measurement from the nth microarray, where for
$n \in \mathcal{N}_1 = \{1, 2, \ldots, 50\}$, microarray $n$ is from an individual without prostate
cancer and for $n \in \mathcal{N}_2 = \{51, 52, \ldots, 102\}$, microarray $n$ is from an individual
with prostate cancer. The goal is to determine which genes, if any, are differentially
expressed across treatment groups.

We assume that $X[m, n] \sim N(\gamma_m, \sigma_m^2)$ for $n \in \mathcal{N}_1$ and
$X[m, n] \sim N(\gamma_m + \mu_m, \sigma_m^2)$ for $n \in \mathcal{N}_2$. The mth null and
alternative hypotheses are $H_{m0} : \mu_m = 0, F_m \in \mathcal{F}^{Norm}$ and
$H_{m1} : \mu_m \ne 0, F_m \in \mathcal{F}^{Norm}$, where $\mathcal{F}^{Norm}$ is the collection of
all normal distribution functions.

Table 2: Depiction of a portion of the microarray data in Singh et al. (2002), where $x[m, n]$
is the mth gene expression level from the nth individual. Data for the nth microarray is
$x[, n]$ and data for the mth gene is $x[m, ]$.

              control group                       cancer group
              x[,1]    x[,2]    ...   x[,50]      x[,51]   x[,52]   ...   x[,102]
x[1,]         -.931    -.840    ...   3.81        -1.12    1.01     ...   -.001
x[2,]         -1.07    -.880    ...   -.477       -.571    -.811    ...   -.836
x[6033,]      -.754    -.708    ...   -.011       .457     .578     ...   -.162

We present the form of the compound and simple p-value statistics. Here,
$\mathcal{T} = \mathcal{T}_1 \cup \mathcal{T}_2$, where $\mathcal{T}_1$ and $\mathcal{T}_2$ index
training data from control and treatment groups, respectively, and
$\bar{\mathcal{T}} = \bar{\mathcal{T}}_1 \cup \bar{\mathcal{T}}_2$, where
$\bar{\mathcal{T}}_1 = \mathcal{N}_1 \setminus \mathcal{T}_1$ and
$\bar{\mathcal{T}}_2 = \mathcal{N}_2 \setminus \mathcal{T}_2$ index test data from control and
treatment groups, respectively. For these data, since the simulation studies from the previous
section suggest that between 1 and 10 percent of data should be used as training data, we
(randomly) select 4 of our 102 microarrays as training data ($\mathcal{T}_1 = \{10, 22\}$ and
$\mathcal{T}_2 = \{60, 88\}$). The two sample T-test statistic for $H_{m0}$ based on test data
$X[m, \bar{\mathcal{T}}]$ is

\[ T_m(X[m, \bar{\mathcal{T}}]) = \frac{\sum_{n \in \bar{\mathcal{T}}_2} X[m, n]/n_{\bar{\mathcal{T}}_2} - \sum_{n \in \bar{\mathcal{T}}_1} X[m, n]/n_{\bar{\mathcal{T}}_1}}{s_{pm} \sqrt{\frac{1}{n_{\bar{\mathcal{T}}_1}} + \frac{1}{n_{\bar{\mathcal{T}}_2}}}}, \]

where $n_A = |A|$ and $s_{pm}$ is the pooled sample standard deviation of
$X[m, \bar{\mathcal{T}}_1]$ and $X[m, \bar{\mathcal{T}}_2]$. To remain consistent with notation in
the previous sections, we transform $T_m$ via
$Z_m = \Phi^{-1}(t_{n_{\bar{\mathcal{T}}} - 2}(T_m(X[m, \bar{\mathcal{T}}])))$, where $t_\nu$ denotes
the distribution function of Student's t distribution with $\nu$ degrees of freedom, so that
$Z_m \sim N(0, 1)$ under $H_{m0}$ by the probability integral transformation. In a similar
fashion, we transform the training data via
$Y_m = \Phi^{-1}(t_{n_{\mathcal{T}} - 2}(T_m(X[m, \mathcal{T}])))$, where $T_m(X[m, \mathcal{T}])$ is
Student's two-sample T-test statistic as above but computed on $X[m, \mathcal{T}_1]$ and
$X[m, \mathcal{T}_2]$. It is important to note that since $\lambda^2$ is now fixed, we do not
parameterize our test data and training data to have mean and variance that depend on
$\lambda^2$. The compound decision function can then be defined via

\[ \delta_m(Y, Z_m; \eta_m) = \begin{cases} 1 & \text{if } Z_m < \Phi^{-1}(\eta_m h_m(Y)) \\ 1 & \text{if } Z_m > \Phi^{-1}(1 - \eta_m [1 - h_m(Y)]) \\ 0 & \text{otherwise,} \end{cases} \]

where $Y = (Y_1, Y_2, \ldots, Y_M)$. It can be verified using arguments from Section 4 that the
compound p-value can be written as in expression (7), and that $h_m(Y)$ should estimate
$I(\mu_m < 0)$. Hence, we define $h_m(Y)$ as in (10) with $\lambda^2 = 1$ since $Var(Y_m) = 1$.
For the compound p-values, we consider taking $\epsilon = 1$ and 2 in $\hat{p}(y; \epsilon)$ since
this corresponds to 1 and 2 standard deviations of $Y_m$ under $H_{m0}$. We also take $p = .1$
as in Efron (2009) and $p = 1$ as in the previous section. The usual two sample T-test
p-values were computed via $P_{\Delta^{(s)}}(X[m, ]) = 2[1 - t_{100}(|T(X[m, ])|)]$, where
$T(X[m, ])$ is the two sample T-test statistic as above but with index sets $\mathcal{N}_1$ and
$\mathcal{N}_2$, i.e. using all of the data from control and treatment groups.
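The compound decision function rejects when $Z_m$ falls in a lower tail of null mass $\eta_m h_m(Y)$ or an upper tail of null mass $\eta_m[1 - h_m(Y)]$. The sketch below encodes that rule and the smallest $\eta_m$ at which it rejects. The closed form for the p-value is derived here from the decision rule itself and is offered as an assumption rather than as a quotation of expression (7); the function names are hypothetical, and `h` stands for a value of $h_m(Y)$ supplied by the user.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def compound_decision(z, h, eta):
    """1 if z lies in the lower tail of mass eta*h or the upper tail of mass eta*(1-h)."""
    phi_z = _STD_NORMAL.cdf(z)
    # z < Phi^{-1}(eta*h) is equivalent to Phi(z) < eta*h, and similarly for the upper tail.
    return int(phi_z < eta * h or phi_z > 1 - eta * (1 - h))


def compound_p_value(z, h):
    """Smallest eta at which compound_decision switches to 1 (derived, see lead-in)."""
    phi_z = _STD_NORMAL.cdf(z)
    candidates = []
    if h > 0:
        candidates.append(phi_z / h)
    if h < 1:
        candidates.append((1 - phi_z) / (1 - h))
    return min(1.0, min(candidates))


# h = 0.5 splits the size equally and recovers the usual two-sided p-value.
two_sided = compound_p_value(1.5, 0.5)
# h = 0.2 puts most of the size in the upper tail, favoring positive shifts.
upper_weighted = compound_p_value(1.5, 0.2)
```

When the training data suggest $\mu_m > 0$ (small `h`), a positive $Z_m$ yields a smaller p-value than the two-sided test would give, which is the source of the power gain.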
Figure 2: The number of discoveries when applying the BH (top) and Q-value (bottom)
procedures to simple p-values (x) and compound p-values when $p$ is estimated with
$\hat{p}(y; 2)$ (o), assumed to be .1 (△), and assumed to be 1 (+).



The number of discoveries made by the BH and Q-value procedures when applied to each
of the different collections of p-values at levels $\alpha = .01, .02, \ldots, .2$ is presented in
Figure 2. Results when compound p-values made use of $\hat{p}(y; 1)$ are not presented because
we get a negative estimate of $p$. Such estimates are not uncommon when $p$ and $\epsilon$ are
near 0 due to the fact that the bias of $\hat{p}(Y; \epsilon)$ is negligible in this setting. See
Efron (2004) for a discussion regarding this issue. We see that when making use of any of
the compound p-values, rather than the simple p-values, both procedures always make at least
as many (sometimes substantially more) discoveries. For example, when the BH procedure is
applied at $\alpha = .2$ to compound p-values with $\hat{p}(y; 2) = .017$, 15 discoveries, rather
than 3, are made. For $\alpha = .1$, the compound p-values which assume $p = .1$ and $p = 1$ allow
for the BH procedure to make 5 and 6 discoveries, respectively, while the use of the simple
p-values leads to 0 discoveries. Results are similar for the Q-value procedure in that
compound p-values always allow for at least as many discoveries, and sometimes allow for
substantially more discoveries.



7 Concluding Remarks 

Recent multiple testing research has established that compound multiple testing procedures 
are typically more efficient than simple multiple testing procedures. In this paper, we have 
shown that these multiple testing procedures can be made even more efficient by making 
use of compound test statistics. We have limited our study to compound p-value statistics, 
largely due to the fact that a substantial number of multiple testing procedures make use of 
p-value statistics, thus making results in this paper widely applicable. 

Here, the data were split into training and test data, and only training data (as opposed
to all the data) were utilized to borrow information across tests. The main advantage of this
data-splitting approach over the usual double dipping approach is that validity of the
resulting p-values and multiple testing procedure is guaranteed, even if data are poorly
modeled under the alternative hypotheses, and even for a small number of tests $M$. Intuition
suggests that the disadvantage of this approach is that in some settings efficiency will be
sacrificed since less data is utilized to estimate parameters governing the form of the Oracle
decision rule. A more thorough comparison of this approach and the double dipping approach is
warranted, but is beyond the scope of this paper. See also Peña et al. (2011) for a discussion
on this issue.

The examples in this paper could likely be improved upon by considering other types of 
models for the joint behavior of the data, as well as other types of estimators. Method of
moments estimators were utilized to allow for easy-to-compute p-values.

The assumption that test statistics are independent under the null hypotheses may not 
be satisfied in practice. In this setting, we cannot expect compound or simple p-value statis- 
tics to be independent under the null hypotheses. However, many p-value based multiple 
testing procedures, including some of those mentioned in the Introduction, do not require 
the independence condition to be satisfied. Results in Sections 2 and 3 can still be used to 
develop compound p-value statistics satisfying the uniformity condition, which can then be
used in these multiple testing procedures. See Benjamini and Yekutieli (2001); Sarkar (2002,
2007); Sun and Cai (2009) for more on relaxing the independence condition.

In closing, we reiterate that the intent in this paper is not to develop a new compound multiple
testing procedure, but rather to develop compound p-value statistics for use in existing 
multiple testing procedures. We have only studied the effects of compound p- value statistics 
on two compound multiple testing procedures, but we suspect that most multiple testing 
procedures will behave in a more efficient manner if they are used in conjunction with 
compound, rather than simple, p-value statistics. 

8 Appendix: Proofs 

Proof of Theorem 1. It suffices to show that $P_{\Delta_m}(X)$ is $\mathcal{F}_{m0}$-uniform for
every $m \in \mathcal{M}_0$. But since
$\sup_{F \in \mathcal{F}_{m0}} E_F(\delta_m(X; \eta_m)) = \eta_m$ for every $\eta_m \in [0, 1]$, the
result follows from Theorem 2.3 in Habiger and Peña (2011) by taking $X_m = X$.

Proof of Theorem 2. Suppose we could show that
$P_F(\delta_m(X; t_m) = I(P_{\Delta_m}(X) \le t_m)) = 1$ for every $F \in \mathcal{F}$,
$t_m \in [0, 1]$, and $m \in \mathcal{M}$. Then it will follow that

\[
\begin{aligned}
P_F\Big( \bigcap_{m \in \mathcal{M}} [\delta_m(X; t_m) = I(P_{\Delta_m}(X) \le t_m)] \Big)
&= 1 - P_F\Big( \bigcup_{m \in \mathcal{M}} [\delta_m(X; t_m) \ne I(P_{\Delta_m}(X) \le t_m)] \Big) \\
&\ge 1 - \sum_{m \in \mathcal{M}} P_F(\delta_m(X; t_m) \ne I(P_{\Delta_m}(X) \le t_m)) \\
&= 1 - 0 = 1,
\end{aligned}
\]

which will imply that
$P_F(\bigcap_{m \in \mathcal{M}} [P_{\Delta_m}(X) \le t_m]) = P_F(\bigcap_{m \in \mathcal{M}} [\delta_m(X; t_m) = 1])$.
The result will then follow from equations (1) and (2). Therefore, it suffices to show that
$P_F(\delta_m(X; t_m) = I(P_{\Delta_m}(X) \le t_m)) = 1$.

Fix $F \in \mathcal{F}$. There exists a null set $N \subset \mathcal{X}$ such that for every
$x \in N^c$, $t_m \mapsto \delta_m(x; t_m)$ is right-continuous and nondecreasing with
$P_F(X \in N^c) = 1$. Fix an $x \in N^c$. If $a \in \{t_m : \delta_m(x; t_m) = 1\}$, then
$\inf\{t_m : \delta_m(x; t_m) = 1\} \le a$, implying that $P_{\Delta_m}(x) \le a$. Hence,
$\{t_m : \delta_m(x; t_m) = 1\} \subseteq \{t_m : P_{\Delta_m}(x) \le t_m\}$ by the definition of
$P_{\Delta_m}$. Next, suppose that $a \in \{t_m : P_{\Delta_m}(x) \le t_m\}$. Since
$\delta_m(x; t_m)$ is right-continuous and nondecreasing, $\delta_m(x; a) = 1$, so that
$a \in \{t_m : \delta_m(x; t_m) = 1\}$ and
$\{t_m : \delta_m(x; t_m) = 1\} \supseteq \{t_m : P_{\Delta_m}(x) \le t_m\}$. That is,
$\delta_m(x; t_m) = I(P_{\Delta_m}(x) \le t_m)$ for every $x \in N^c$. Since $P_F(N^c) = 1$, it
follows that $P_F(\delta_m(X; t_m) = I(P_{\Delta_m}(X) \le t_m)) = 1$.

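Proof of Theorem 3. Theorem 1 ensures that $P_\Delta(Y, Z)$ is
$\mathcal{F}_{\mathcal{M}_0}$-uniform since $\Delta$ is a decision process. From Theorem 2, if
$\Delta$ is $\mathcal{F}_{\mathcal{M}_0}$-independent, then $P_\Delta(Y, Z)$ is
$\mathcal{F}_{\mathcal{M}_0}$-independent. Hence, it suffices to show that

\[
P_F\Big( \bigcap_{m \in \mathcal{M}} [\delta_m(Y, Z_m; \eta_m) = d_m] \Big)
= P_F\Big( \bigcap_{m \in \mathcal{M}_1} [\delta_m(Y, Z_m; \eta_m) = d_m] \Big)
\prod_{m \in \mathcal{M}_0} P_F(\delta_m(Y, Z_m; \eta_m) = d_m).
\]

But, since $P_F(\delta_m(Y, Z_m; \eta_m) = d_m \mid Y) = k_m(\eta_m)$ for $m \in \mathcal{M}_0$, where

\[ k_m(\eta_m) = \eta_m I(d_m = 1) + (1 - \eta_m) I(d_m = 0), \]

then by the conditions of the theorem and using the law of iterated expectations, we get

\[
\begin{aligned}
P_F\Big( \bigcap_{m \in \mathcal{M}} [\delta_m(Y, Z_m; \eta_m) = d_m] \Big)
&= E_F\Big\{ P_F\Big( \bigcap_{m \in \mathcal{M}} [\delta_m(Y, Z_m; \eta_m) = d_m] \,\Big|\, Y \Big) \Big\} \\
&= E_F\Big\{ P_F\Big( \bigcap_{m \in \mathcal{M}_1} [\delta_m(Y, Z_m; \eta_m) = d_m] \,\Big|\, Y \Big) \prod_{m \in \mathcal{M}_0} P_F(\delta_m(Y, Z_m; \eta_m) = d_m \mid Y) \Big\} \\
&= E_F\Big\{ P_F\Big( \bigcap_{m \in \mathcal{M}_1} [\delta_m(Y, Z_m; \eta_m) = d_m] \,\Big|\, Y \Big) \Big\} \prod_{m \in \mathcal{M}_0} k_m(\eta_m) \\
&= P_F\Big( \bigcap_{m \in \mathcal{M}_1} [\delta_m(Y, Z_m; \eta_m) = d_m] \Big) \prod_{m \in \mathcal{M}_0} P_F(\delta_m(Y, Z_m; \eta_m) = d_m).
\end{aligned}
\]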

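Proof of Corollary 1. Since Condition 1 is satisfied, by Theorem 3 it is sufficient to
show that for every $m \in \mathcal{M}_0$ and $F \in \mathcal{F}_{\mathcal{M}_0}$,
$E_F[\delta_m^{(co)}(Y, Z_m; \eta_m) \mid Y] = \eta_m$ for any $\eta_m \in [0, 1]$. But if
$m \in \mathcal{M}_0$,

\[
\begin{aligned}
E_F[\delta_m^{(co)}(Y, Z_m; \eta_m) \mid Y]
&= E_F[\Phi(l_m(Y, \eta_m)) + 1 - \Phi(u_m(Y, \eta_m)) \mid Y] \\
&= E_F[\Phi(\Phi^{-1}(\eta_m h_m(Y))) + 1 - \Phi(\Phi^{-1}(1 - \eta_m[1 - h_m(Y)])) \mid Y] \\
&= E_F[\eta_m h_m(Y) + 1 - (1 - \eta_m[1 - h_m(Y)]) \mid Y] \\
&= \eta_m[h_m(Y) + 1 - h_m(Y)] = \eta_m
\end{aligned}
\]

for any $\eta_m \in [0, 1]$.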
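The cancellation in the last display can be confirmed numerically: for any value of $h_m(Y)$ the two tail probabilities add up to $\eta_m$. The sketch below is a sanity check under the assumption that, given $Y$, $Z_m$ is standard normal under the null hypothesis; the function name and the grid of values are hypothetical.

```python
from statistics import NormalDist


def conditional_size(h, eta):
    """Null rejection probability given h = h_m(Y), for Z_m standard normal."""
    std = NormalDist()
    lower = std.inv_cdf(eta * h)              # l_m(Y, eta)
    upper = std.inv_cdf(1 - eta * (1 - h))    # u_m(Y, eta)
    return std.cdf(lower) + 1 - std.cdf(upper)


# The size equals eta regardless of how the training data split it across tails.
sizes = {(h, eta): conditional_size(h, eta)
         for h in (0.1, 0.5, 0.9)
         for eta in (0.01, 0.05, 0.5)}
```

This is the sense in which the training data may shape the rejection region freely without affecting validity: $h_m(Y)$ only redistributes the size $\eta_m$ between the two tails.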

Proof of Theorem 4. First, suppose that $m \in \mathcal{M}_0$. Then it follows from Theorem 1
and the fact that $E_F(\delta_m^{(co)}(Y_M, Z_m; \eta_m)) = \eta_m$ and
$E_F(\delta_m^{(or)}(\mu_m, Z_m; \eta_m)) = \eta_m$ for every $\eta_m \in [0, 1]$, that
$P_{\Delta^{(or)}}(\mu_m, Z_m) \stackrel{d}{=} U \stackrel{d}{=} P_{\Delta^{(co)}}(Y_M, Z_m)$, where
$\stackrel{d}{=}$ means "equal in distribution" and $U$ is a uniform random variate. Now, for
$m \in \mathcal{M}_1 = \mathcal{M} \setminus \mathcal{M}_0 = \{m : \mu_m = \theta\}$, if
$h_m(Y_M) \xrightarrow{P} I(\theta < 0)$ as $M \to \infty$, then the Continuous Mapping Theorem
(see, for example, page 19 in Serfling (1980)) and the expressions for the Oracle p-value and
the compound p-value in (7) imply that
$P_{\Delta^{(co)}}(Y_M, Z_m) \xrightarrow{P} P_{\Delta^{(or)}}(\mu_m, Z_m)$. Hence, it suffices to show
that $h_m(Y_M) \xrightarrow{P} I(\theta < 0)$. To do so, we show that

\[ \hat{\theta}(Y_M) \xrightarrow{P} k\theta \tag{11} \]

for some $k > 0$ and

\[ \hat{\tau}^2(Y_M) \xrightarrow{P} 0, \tag{12} \]

since these results, together with the Continuous Mapping Theorem, and writing

\[ h_m(Y_M) = \Phi\left( \frac{-Y_m}{\sqrt{\lambda^2 + 1/\hat{\tau}^2(Y_M)}} - \frac{\hat{\theta}(Y_M)}{\sqrt{\hat{\tau}^2(Y_M)(\lambda^2 \hat{\tau}^2(Y_M) + 1)}} \right), \]

imply $h_m(Y_M) \xrightarrow{P} \Phi(-\mathrm{sign}(\theta)\infty) = I(\theta < 0)$.

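To show (11), first note that

\[ 0 < p^* \equiv 1 - \frac{1}{M} \sum_{m \in \mathcal{M}} E\left[ \frac{I(-\epsilon < Y_m < \epsilon)}{\Phi(\epsilon/\lambda) - \Phi(-\epsilon/\lambda)} \right] \le p. \tag{13} \]

Hence, by the definition of $\hat{p}(Y_M; \epsilon)$ and the weak law of large numbers (WLLN),
$\hat{p}(Y_M; \epsilon) \xrightarrow{P} p^*$. Similarly, since $Var(Y_m) < \infty$, by the WLLN we
have $\bar{Y}_M/(\lambda^2 p) \xrightarrow{P} \theta$. Hence,

\[ \hat{\theta}(Y_M) = \frac{\bar{Y}_M}{\lambda^2 \hat{p}(Y_M; \epsilon)} = \left( \frac{\bar{Y}_M}{\lambda^2 p} \right) \left( \frac{p}{\hat{p}(Y_M; \epsilon)} \right) \xrightarrow{P} \theta \frac{p}{p^*}. \]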

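To show (12), first note that $\hat{\theta}(Y_M)^2 \xrightarrow{P} \theta^2 p^2/(p^*)^2$ since
$g(x) = x^2$ is continuous. From the continuous mapping theorem and since $p/p^* \ge 1$ and
$(1 - p^*) \ge (1 - p)$ by the inequality in (13),

\[
\begin{aligned}
\lambda^2 + \lambda^4 \hat{\theta}(Y_M)^2 \hat{p}(Y_M; \epsilon)(1 - \hat{p}(Y_M; \epsilon))
&\xrightarrow{P} \lambda^2 + \lambda^4 \theta^2 \frac{p^2}{(p^*)^2} p^*(1 - p^*) \\
&= \lambda^2 + \lambda^4 \theta^2 p \left( \frac{p}{p^*} \right) (1 - p^*) \\
&\ge \lambda^2 + \lambda^4 \theta^2 p(1 - p).
\end{aligned}
\]

Since $S^2(Y_M) \xrightarrow{P} E[S^2(Y_M)] = \lambda^2 + \lambda^4 \theta^2 p(1 - p)$, the above
result implies

\[ S^2(Y_M) - \left[ \lambda^2 + \lambda^4 \hat{\theta}(Y_M)^2 \hat{p}(Y_M; \epsilon)(1 - \hat{p}(Y_M; \epsilon)) \right] \xrightarrow{P} c \le 0 \]

for some $c$. Hence,

\[ \frac{S^2(Y_M) - \left[ \lambda^2 + \lambda^4 \hat{\theta}(Y_M)^2 \hat{p}(Y_M; \epsilon)(1 - \hat{p}(Y_M; \epsilon)) \right]}{\lambda^4 \hat{p}(Y_M; \epsilon)} \xrightarrow{P} \frac{c}{\lambda^4 p^*} \le 0 \]

so that

\[ \hat{\tau}^2(Y_M) = \max\left\{ \frac{S^2(Y_M) - \left[ \lambda^2 + \lambda^4 \hat{\theta}(Y_M)^2 \hat{p}(Y_M; \epsilon)(1 - \hat{p}(Y_M; \epsilon)) \right]}{\lambda^4 \hat{p}(Y_M; \epsilon)}, 0 \right\} \xrightarrow{P} 0. \]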
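The limiting variance used in the proof follows from the law of total variance. The derivation below is a sketch under the assumptions of Theorem 4 as used above: $\mu_m$ equals $\theta$ for a proportion $p$ of the tests and 0 otherwise (treated here as a random draw), and $Y_m \mid \mu_m \sim N(\lambda^2 \mu_m, \lambda^2)$.

```latex
\begin{aligned}
Var(Y_m) &= E[Var(Y_m \mid \mu_m)] + Var(E[Y_m \mid \mu_m]) \\
         &= \lambda^2 + Var(\lambda^2 \mu_m) \\
         &= \lambda^2 + \lambda^4 \left( E[\mu_m^2] - E[\mu_m]^2 \right) \\
         &= \lambda^2 + \lambda^4 \left( \theta^2 p - \theta^2 p^2 \right) \\
         &= \lambda^2 + \lambda^4 \theta^2 p (1 - p).
\end{aligned}
```

The same calculation gives $E[Y_m] = \lambda^2 p \theta$, which is why $\bar{Y}_M/(\lambda^2 p)$ converges to $\theta$.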